Entity Resolution Acceleration using Micron’s Automata Processor

Authors

  • Chunkun Bo
  • Ke Wang
  • Jeffrey J. Fox
  • Kevin Skadron
Abstract

Entity Resolution (ER), the process of finding identical entities across different databases, is critical to many information integration applications. As database sizes explode in the big-data era, it becomes computationally expensive to recognize identical entities across all possible records when variations are allowed. Profiling results show that approximate matching is the primary bottleneck. Micron’s Automata Processor (AP), an efficient and scalable semiconductor architecture for parallel automata processing, provides a new opportunity for hardware acceleration of ER. We propose an AP-accelerated ER solution that targets this bottleneck, the fuzzy matching of similar but potentially inexactly matched names, and use a real-world application to illustrate its effectiveness. Results show 121x to 4200x speedups for matching one record, with better accuracy (7.6% more correct pairs and 39% lower generalized merge distance cost) than the existing CPU method.
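To make the idea of automata-based fuzzy name matching concrete, here is a minimal software sketch. It is not the authors' AP design, only an illustration under stated assumptions: it simulates a nondeterministic automaton that accepts any string within k Levenshtein edits of a pattern, tracking all active states at once, which is roughly the kind of parallel state activation that automata hardware performs. The function name `within_k_edits` and the sample names are hypothetical.

```python
def within_k_edits(pattern: str, text: str, k: int) -> bool:
    """Return True if `text` is within Levenshtein distance k of `pattern`.

    Each state is a pair (i, e): i pattern characters consumed, e errors used.
    All active states are advanced together for every input character.
    """
    n = len(pattern)

    def closure(states):
        # Epsilon moves: skip a pattern character (a deletion) for one error.
        seen = set(states)
        stack = list(states)
        while stack:
            i, e = stack.pop()
            if i < n and e < k and (i + 1, e + 1) not in seen:
                seen.add((i + 1, e + 1))
                stack.append((i + 1, e + 1))
        return seen

    active = closure({(0, 0)})
    for ch in text:
        nxt = set()
        for i, e in active:
            if i < n and pattern[i] == ch:
                nxt.add((i + 1, e))          # exact match, no new error
            if e < k:
                nxt.add((i, e + 1))          # insertion: extra character in text
                if i < n:
                    nxt.add((i + 1, e + 1))  # substitution
        active = closure(nxt)
        if not active:
            return False                     # no state survives: reject early
    # Accept if any surviving state has consumed the whole pattern.
    return any(i == n for i, _ in active)


if __name__ == "__main__":
    print(within_k_edits("Jeffrey", "Jeffery", 2))
    print(within_k_edits("Jeffrey", "Jeffery", 1))
```

The error budget k trades recall for precision: a larger k tolerates more spelling variation in names but also admits more false matches.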


Similar resources

Subset Encoding: Increasing Pattern Density for Finite Automata

Micron’s Automata Processor is an innovative reconfigurable hardware accelerator for parallel finite-automata-based regular-expression matching. While the Automata Processor has demonstrated potential for many pattern matching applications, other applications receive reduced benefit from the architecture due to capacity limitations or routing limitations. In this paper, we present an efficient i...


Uses for Random and Stochastic Input on Micron’s Automata Processor

Micron’s Automata Processor (AP) is a configurable memory-based device, purpose-built to emulate a theoretical nondeterministic finite automaton (NFA). While NFAs are not particularly suited for floating point computation, they are extremely powerful and efficient pattern matchers and have been shown to provide large speedups over traditional von Neumann execution for rule-based, data-mining appl...


Fast Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 using Automata Processing

The CRISPR/Cas9 system is a bacterial immune system that protects cells from foreign genetic elements and has been modified to edit genomes in targeted locations. However, the risk of binding at off-target locations limits its power, and thus it is necessary to identify these off-targets. Finding off-targets is computationally expensive, especially when allowing several mismatches. We present the CRI...


Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks because noise occurs during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for noise removal of distorted image by the salt an...


Cellular Automata on the Micron Automata Processor

A cellular automaton (CA) is a well-studied and widely used time-evolving discrete model. CAs are studied in many fields of science, such as computability theory, mathematics, physics, complexity science, theoretical biology and microstructure modeling. Some CA models have been proven to be Turing complete, such as the elementary cellular automaton (ECA) of Rule 110 and Conway's Game of Life. Mi...




Publication date: 2015